Patent abstract:
DECODING OF MULTI-CHANNEL AUDIO ENCODED BITSTREAMS USING ADAPTIVE HYBRID TRANSFORMATION. The present invention improves the processing efficiency of a process used to decode frames of an enhanced AC-3 bit stream by processing each audio block in a frame only once. The audio blocks of encoded data are decoded in block order instead of channel order. Exemplary decoding processes for enhanced bitstream coding features, such as adaptive hybrid transform processing and spectral extension, are disclosed.
Publication number: BR112012013745B1
Application number: R112012013745-0
Filing date: 2010-10-28
Publication date: 2020-10-27
Inventor: Kamalanathan Ramamoorthy
Applicant: Dolby Laboratories Licensing Corporation
Primary IPC class:
Patent description:

CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to US Provisional Patent Application 61/267,422, filed on December 7, 2009, which is hereby incorporated by reference in its entirety.
TECHNICAL FIELD
[0002] The present invention relates in general to audio coding systems, and more specifically to methods and devices for decoding encoded digital audio signals.
BACKGROUND ART
[0003] The United States Advanced Television Systems Committee (ATSC), Inc., which was formed by member organizations of the Joint Committee on InterSociety Coordination (JCIC), has developed a coordinated set of national standards for the development of domestic television services in the United States. These standards, including the relevant audio encoding/decoding standards, are set forth in several documents, including Document A/52B entitled "Digital Audio Compression Standard (AC-3, E-AC-3)," Revision B, published on June 14, 2005, which is incorporated herein by reference in its entirety. The audio coding algorithm specified in Document A/52B is referred to as "AC-3." An enhanced version of this algorithm, which is described in Annex E of the document, is referred to as "E-AC-3." These two algorithms are referred to here collectively as "AC-3" and the relevant standards are referred to here as the "ATSC Standards".
[0004] Document A/52B does not specify many of the design aspects of the algorithm; instead, it describes a "bitstream syntax" defining structural and syntactic characteristics of the encoded information that a compliant decoder must be able to decode. Many applications that comply with the ATSC Standards transmit digital audio information encoded as binary data in serial form. As a result, the encoded data is often referred to as a bit stream, but other data arrangements are permissible. For ease of discussion, the term "bit stream" is used here to refer to an encoded digital audio signal regardless of the format or the recording or transmission technique that is used.
[0005] A bit stream that complies with the ATSC Standards is organized as a series of "synchronization frames." Each frame is a unit of the bit stream that can be fully decoded into one or more channels of pulse code modulated (PCM) digital audio data. Each frame includes "audio blocks" and frame metadata associated with the audio blocks. Each of the audio blocks contains encoded audio data representing digital audio samples for one or more audio channels, and block metadata associated with the encoded audio data.
[0006] Although the details of algorithm design are not specified in the ATSC Standards, certain features of the algorithms have been widely adopted by manufacturers of professional and consumer decoding equipment. A nearly universal implementation feature of decoders that can decode enhanced AC-3 bit streams generated by E-AC-3 encoders is an algorithm that decodes all of the encoded data in a frame for a given channel before decoding the data for another channel. This approach was used to improve the performance of implementations on single-chip processors having little on-chip memory, because some decoding processes require data for a given channel from each of the audio blocks in a frame. By processing the encoded data in channel order, decoding operations can be performed using on-chip memory for a particular channel. The decoded data for that channel can subsequently be transferred to off-chip memory, freeing on-chip resources for the next channel.
[0007] A bit stream that complies with the ATSC Standards can be very complex because a large number of variations are possible. Some examples, mentioned here only briefly, include channel coupling, channel rematrixing, dialog normalization, dynamic range compression, downmixing and block length switching for standard AC-3 bit streams, and multiple independent streams, dependent substreams, spectral extension and adaptive hybrid transformation for enhanced AC-3 bit streams. Details of these features can be obtained from Document A/52B.
[0008] By processing each channel independently, the algorithms required for these variations can be simplified. Subsequent complex processes such as synthesis filtering can be carried out without concern for these variations. Simpler algorithms seemed to provide a benefit by reducing the computational resources needed to process a frame of audio data.
[0009] Unfortunately, this approach requires the decoding algorithm to read and examine the data in all audio blocks twice. Each iteration of reading and examining the audio blocks in a frame is referred to here as a "pass" over the audio blocks. The first pass performs extensive calculations to determine the location of the encoded audio data in each block. The second pass repeats many of these same calculations as it performs the decoding processes. Both passes require considerable computational resources to calculate the locations of the data. If the initial pass can be eliminated, it may be possible to reduce the total processing resources needed to decode a frame of audio data.
DESCRIPTION OF THE INVENTION
[00010] It is an object of the present invention to reduce the computational resources required to decode a frame of audio data in encoded bit streams organized in hierarchical units such as the frames and audio blocks mentioned above. The preceding text and the following description refer to encoded bit streams that comply with the ATSC Standards, but the present invention is not limited to use only with these bit streams. The principles of the present invention can be applied to essentially any encoded bit stream that has structural characteristics similar to the frames, blocks and channels used in AC-3 coding algorithms.
[00011] In accordance with one aspect of the present invention, a method decodes a frame of an encoded digital audio signal by receiving the frame and examining the encoded digital audio signal in a single pass to decode the encoded audio data for each audio block in block order. Each frame comprises frame metadata and a plurality of audio blocks. Each audio block comprises block metadata and encoded audio data for one or more audio channels. The block metadata comprises control information describing coding tools used by the encoding process that produced the encoded audio data. One of the coding tools is adaptive hybrid transform processing, which applies a bank of analysis filters implemented by a main transform to one or more audio channels to generate spectral coefficients representing the spectral content of those channels, and applies a secondary transform to the spectral coefficients of at least some of the one or more audio channels to generate hybrid transform coefficients. The decoding of each audio block determines whether the encoding process used adaptive hybrid transform processing to encode any of the encoded audio data. If the encoding process used adaptive hybrid transform processing, the method obtains all of the hybrid transform coefficients for the frame from the audio data encoded in the first audio block in the frame, applies an inverse secondary transform to the hybrid transform coefficients to obtain inverse secondary transform coefficients, and obtains spectral coefficients from the inverse secondary transform coefficients. If the encoding process did not use adaptive hybrid transform processing, the spectral coefficients are obtained from the audio data encoded in the respective audio block. An inverse main transform is applied to the spectral coefficients to generate an output signal representing the one or more channels in the respective audio block.
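The single-pass, block-order decoding loop described above can be sketched in miniature. The frame layout used here (a dict of blocks, each with a per-block "gain" and per-channel coded values) is a hypothetical placeholder for illustration only, not the actual A/52B structure.

```python
def decode_frame_single_pass(frame):
    """Decode every audio block in one pass, in block order."""
    outputs = []
    for block in frame["blocks"]:        # a single pass over the audio blocks
        block_out = {}
        # Data locations are determined once, while decoding, rather than in
        # a separate preliminary pass over the whole frame.
        for channel, coded in block["channels"].items():
            block_out[channel] = [value * block["gain"] for value in coded]
        outputs.append(block_out)
    return outputs

frame = {
    "blocks": [
        {"gain": 0.5, "channels": {"L": [2.0, 4.0], "R": [6.0, 8.0]}},
        {"gain": 2.0, "channels": {"L": [1.0, 1.0], "R": [3.0, 3.0]}},
    ]
}
decoded = decode_frame_single_pass(frame)
```

The point of the sketch is the loop structure: each block is visited exactly once, and per-channel work happens inside that single visit rather than in a channel-order outer loop.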
[00012] The various features of the present invention and its preferred embodiments can be better understood by reference to the following discussion and the accompanying drawings, in which like reference numbers refer to like elements in the several figures. The content of the following discussion and the drawings is set forth as examples only and should not be construed as representing limitations on the scope of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[00013] Figure 1 is a schematic block diagram of exemplary implementations of an encoder.
[00014] Figure 2 is a schematic block diagram of exemplary implementations of a decoder.
[00015] Figures 3A and 3B are schematic illustrations of frames in bit streams conforming to the standard and enhanced syntactic structures.
[00016] Figures 4A and 4B are schematic illustrations of audio blocks conforming to the standard and enhanced syntactic structures.
[00017] Figures 5A to 5C are schematic illustrations of exemplary bit streams carrying data with program and channel extensions.
[00018] Figure 6 is a schematic block diagram of an exemplary process implemented by a decoder that processes encoded audio data in channel order.
[00019] Figure 7 is a schematic block diagram of an exemplary process implemented by a decoder that processes encoded audio data in block order.
[00020] Figure 8 is a schematic block diagram of a device that can be used to implement various aspects of the present invention.
MODES FOR CARRYING OUT THE INVENTION
A. Overview of the Coding System
[00021] Figures 1 and 2 are schematic block diagrams of exemplary implementations of an encoder and a decoder for an audio coding system in which the decoder can incorporate various aspects of the present invention. These implementations conform to what is disclosed in Document A/52B mentioned above.
[00022] The purpose of the coding system is to generate an encoded representation of input audio signals that can be recorded or transmitted and subsequently decoded to produce output audio signals that sound essentially identical to the input audio signals, using a minimal amount of digital information to represent the encoded signal. Coding systems that comply with the basic ATSC Standards are capable of encoding and decoding information that can represent from one to the so-called 5.1 channels of audio signals, where 5.1 denotes five channels that can carry full-bandwidth signals and one limited-bandwidth channel designed to carry low-frequency effects (LFE) signals.
[00023] The following sections describe implementations of the encoder and decoder, and some details of the structure of the encoded bit stream and the related encoding and decoding processes. These descriptions are provided so that various aspects of the present invention can be described more succinctly and understood more clearly.
1. Encoder
[00024] With reference to the exemplary implementation in figure 1, the encoder receives a series of pulse code modulated (PCM) samples representing one or more input channels of audio signals from input signal path 1, and applies analysis filter bank 2 to the series of samples to generate digital values representing the spectral composition of the input audio signals. For implementations that comply with the ATSC Standards, the analysis filter bank is implemented by the Modified Discrete Cosine Transform (MDCT) described in Document A/52B. The MDCT is applied to overlapping segments, or blocks, of samples for each input channel to generate blocks of transform coefficients representing the spectral composition of that input channel signal. The MDCT is part of an analysis/synthesis system that uses specially designed window functions and overlap/add processes to cancel time-domain aliasing. The transform coefficients in each block are expressed in block floating-point (BFP) form comprising floating-point exponents and mantissas. This description refers to audio data expressed as floating-point exponents and mantissas because this form of representation is used in bit streams that comply with the ATSC Standards; however, this particular representation is merely one example of numerical representations using scale factors and associated scaled values.
[00025] The BFP exponents for each block collectively provide an approximate spectral envelope for the input audio signal. These exponents are encoded by delta modulation and other coding techniques to reduce information requirements, passed to formatter 5, and input to a psychoacoustic model to calculate the psychoacoustic masking threshold of the signal being encoded. The results of the model are used by bit allocator 3 to allocate digital information in the form of bits for quantizing the mantissas in such a way that the noise level produced by the quantization is kept below the psychoacoustic masking threshold of the signal being encoded. Quantizer 4 quantizes the mantissas according to the bit assignments received from bit allocator 3 and passes them to formatter 5.
[00026] Formatter 5 multiplexes or assembles the coded exponents, quantized mantissas and other control information, sometimes referred to as block metadata, into audio blocks. The data for six successive audio blocks are assembled into units of digital information called frames. The frames themselves also contain control information, or frame metadata. The encoded information for successive frames is output as a bit stream along path 6 for recording on an information storage medium or for transmission over a communication channel. For encoders that comply with the ATSC Standards, the format of each frame in the bit stream conforms to the syntax specified in Document A/52B.
[00027] The coding algorithm used by typical encoders that comply with the ATSC Standards is more complicated than what is illustrated in figure 1 and described above. For example, error detection codes are inserted into the frames to allow a receiving decoder to validate the bit stream. A coding technique known as block length switching, sometimes referred to more simply as block switching, can be used to adapt the temporal and spectral resolution of the analysis filter bank to optimize its performance for varying signal characteristics. The floating-point exponents can be encoded with variable time and frequency resolution. Two or more channels can be combined into a composite representation using a coding technique known as channel coupling. Another coding technique known as channel rematrixing can be used adaptively for two-channel audio signals. Additional coding techniques can be used beyond those mentioned here; some of them are discussed below. Many other implementation details are omitted because they are not necessary to understand the present invention. These details can be obtained from Document A/52B as desired.
2. Decoder
[00028] The decoder performs a decoding algorithm that is essentially the inverse of the encoding algorithm performed in the encoder. With reference to the exemplary implementation in figure 2, the decoder receives an encoded bit stream representing a series of frames from input signal path 11. The encoded bit stream can be retrieved from an information storage medium or received from a communication channel. Deformatter 12 demultiplexes or disassembles the encoded information for each frame into frame metadata and six audio blocks. The audio blocks are disassembled into their respective block metadata, coded exponents and quantized mantissas. The coded exponents are used by a psychoacoustic model in bit allocator 13 to allocate digital information in the form of bits for dequantizing the quantized mantissas in the same way that bits were allocated in the encoder. Dequantizer 14 dequantizes the quantized mantissas according to the bit assignments received from bit allocator 13 and passes the dequantized mantissas to synthesis filter bank 15. The coded exponents are decoded and passed to synthesis filter bank 15.
[00029] The decoded exponents and dequantized mantissas constitute a BFP representation of the spectral content of the input audio signal as encoded by the encoder. Synthesis filter bank 15 is applied to the representation of the spectral content to reconstruct an inexact replica of the original input audio signals, which is passed along output signal path 16. For implementations that comply with the ATSC Standards, the synthesis filter bank is implemented by the Inverse Modified Discrete Cosine Transform (IMDCT) described in Document A/52B. The IMDCT is part of the analysis/synthesis system briefly mentioned above; it is applied to the blocks of transform coefficients to generate blocks of audio samples that are overlapped and added to cancel time-domain aliasing.
[00030] The decoding algorithm used by typical decoders that comply with the ATSC Standards is more complicated than what is illustrated in figure 2 and described above. Some decoding techniques that are the inverse of the coding techniques described above include error detection to correct or conceal errors, block length switching to adapt the temporal and spectral resolution of the synthesis filter bank, channel decoupling to recover channel information from coupled composite representations, and matrix operations to recover representations of two rematrixed channels. Information on other techniques and additional details can be obtained from Document A/52B as desired.
B. Coded Bit Stream Structure
1. Frame
[00031] An encoded bit stream that complies with the ATSC Standards comprises a series of encoded units of information called "synchronization frames," sometimes referred to more simply as frames. As mentioned above, each frame contains frame metadata and six audio blocks. Each audio block contains block metadata and encoded BFP exponents and mantissas for a concurrent interval of one or more channels of audio signals. The structure of the standard bit stream is illustrated schematically in figure 3A. The structure of an enhanced AC-3 bit stream as described in Annex E of Document A/52B is illustrated in figure 3B. The part of each bit stream within the interval marked from SI to CRC is one frame.
[00032] A standard bit pattern, or special synchronization word, is included in the synchronization information (SI) provided at the beginning of each frame so that a decoder can identify the start of a frame and keep its decoding processes synchronized with the encoded bit stream. A bit stream information (BSI) section immediately after the SI contains parameters needed by the decoding algorithm to decode the frame. For example, the BSI specifies the number, type and order of channels represented by the information encoded in the frame, and the dynamic range compression and dialog normalization information to be used by the decoder. Each frame contains six audio blocks (AB0 to AB5), which can be followed by auxiliary data (AUX) if desired. Error detection information in the form of a cyclic redundancy check word (CRC) is provided at the end of each frame.
[00033] A frame in the enhanced AC-3 bit stream also contains audio frame data (AFRM) containing flags and parameters that pertain to additional coding techniques not available for use in encoding a standard bit stream. Some of the additional techniques include the use of spectral extension (SPX), also known as spectral replication, and the adaptive hybrid transform (AHT). Several of these coding techniques are discussed below.
2. Audio Blocks
[00034] Each audio block contains coded representations of BFP exponents and quantized mantissas for 256 transform coefficients, and the block metadata needed to decode the coded exponents and quantized mantissas. This structure is illustrated schematically in figure 4A. The structure of the audio block in an enhanced AC-3 bit stream as described in Annex E of Document A/52B is illustrated in figure 4B. An audio block structure in an alternative version of the bit stream, described in Annex D of Document A/52B, is not discussed here because its unique characteristics are not relevant to the present invention.
[00035] Some examples of block metadata include flags and parameters for block switching (BLKSW), dynamic range compression (DYNRNG), channel coupling (CPL), channel rematrixing (REMAT), the exponent coding technique or strategy (EXPSTR) used to encode the BFP exponents, the coded BFP exponents (EXP), bit allocation information for the mantissas, adjustments to the bit allocation known as delta bit allocation information (DBA), and the quantized mantissas (MANT). Each audio block in an enhanced AC-3 bit stream can contain information for additional coding techniques including spectral extension (SPX).
3. Bit Stream Limitations
[00036] The ATSC Standards impose some limitations on the content of the bit stream that are pertinent to the present invention. Two limitations are mentioned here: (1) the first audio block in the frame, which is referred to as AB0, must contain all the information needed by the decoding algorithm to begin decoding all audio blocks in the frame, and (2) whenever the bit stream begins to carry encoded information generated by channel coupling, the audio block in which channel coupling is first used must contain all the parameters needed for decoupling. These features are discussed below. Information on other processes not discussed here can be obtained from Document A/52B.
C. Standard Coding Processes and Techniques
[00037] The ATSC Standards describe numerous syntactic characteristics of the bit stream in terms of encoding processes, or "coding tools," that can be used to generate an encoded bit stream. An encoder need not employ all of the coding tools, but a decoder that complies with the standard must be able to respond to the coding tools deemed essential for compliance. This response is implemented by executing an appropriate decoding tool that is essentially the inverse of the corresponding coding tool.
[00038] Some of the decoding tools are particularly relevant to the present invention because their use or absence influences how aspects of the present invention are to be implemented. Some decoding processes and decoding tools are discussed briefly in the following paragraphs. These descriptions are not intended to be complete; various details and optional features are omitted. They are intended only to provide a high-level introduction for those unfamiliar with the techniques and to refresh the memory of those who may have forgotten what these terms describe.
[00039] If desired, additional details can be obtained from Document A/52B and from US patent 5,583,962 entitled "Encoder/Decoder for Multi-Dimensional Sound Fields" by Davis et al., issued on December 10, 1996, which is incorporated herein by reference in its entirety.
1. Unpacking the Bit Stream
[00040] All decoders must unpack or demultiplex the encoded bit stream to obtain the encoded parameters and data. This process is represented by deformatter 12 discussed above. It is essentially a process that reads the data in the incoming bit stream and copies parts of the bit stream to registers, copies parts to memory locations, or stores pointers or other references to data in the bit stream held in a buffer. Memory is required to store the data and pointers, and a trade-off can be made between storing this information for later use and re-reading the bit stream to obtain the information whenever it is needed.
2. Exponent Decoding
[00041] The values of all BFP exponents are needed to unpack the data in the audio blocks for each frame because these values indirectly indicate the numbers of bits that are allocated to the quantized mantissas. The exponent values in the bit stream are, however, encoded by differential coding techniques that can be applied across time and frequency. As a result, the data representing the coded exponents must be unpacked from the bit stream and decoded before they can be used by other decoding processes.
3. Bit Allocation Processing
[00042] Each of the quantized BFP mantissas in the bit stream is represented by a variable number of bits that is a function of the BFP exponents and possibly other metadata contained in the bit stream. The BFP exponents are input to a specified model, which calculates a bit allocation for each mantissa. If an audio block also contains delta bit allocation (DBA) information, this additional information is used to adjust the bit allocation calculated by the model.
4. Mantissa Processing
[00043] The quantized BFP mantissas constitute most of the data in an encoded bit stream. The bit allocation is used to determine the location of each mantissa in the bit stream for unpacking, as well as to select the appropriate dequantization function to obtain the dequantized mantissas. Some data in the bit stream can represent multiple mantissas with a single value; in this situation, an appropriate number of mantissas is derived from the single value. Mantissas that have a bit allocation equal to zero can be reproduced with a value of zero or as a pseudo-random number.
5. Channel Decoupling
[00044] The channel coupling coding technique allows an encoder to represent multiple audio channels with less data. The technique combines the spectral components of two or more selected channels, referred to as the coupled channels, to form a single channel of composite spectral components, referred to as the coupling channel. The spectral components of the coupling channel are represented in BFP form. A set of scale factors describing the energy difference between the coupling channel and each coupled channel, known as the coupling coordinates, is derived for each of the coupled channels and included in the encoded bit stream. Coupling is used only over a specified part of each channel's bandwidth.
[00045] When channel coupling is used, as indicated by parameters in the bit stream, a decoder uses a decoding technique known as channel decoupling to derive an inexact replica of the BFP exponents and mantissas for each coupled channel from the spectral components of the coupling channel and the coupling coordinates. This is done by multiplying each spectral component of the coupling channel by the appropriate coupling coordinate. Additional details can be obtained from Document A/52B.
6. Channel Rematrixing
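The rematrixing described in this section can be sketched with a sum/difference matrix pair. The 1/2 scaling on the encoder side is an assumed convention for this illustration (so that the decoder simply adds and subtracts); A/52B fixes the exact scaling.

```python
def rematrix(left, right):
    """Encoder side: convert L/R into sum and difference channels."""
    sums = [(l + r) / 2.0 for l, r in zip(left, right)]
    diffs = [(l - r) / 2.0 for l, r in zip(left, right)]
    return sums, diffs

def derematrix(sums, diffs):
    """Decoder side: recover L/R from the sum and difference channels."""
    left = [s + d for s, d in zip(sums, diffs)]
    right = [s - d for s, d in zip(sums, diffs)]
    return left, right
```

When the two channels are nearly identical, the difference channel is nearly zero and quantizes to very few bits, which is exactly why the technique pays off for highly similar channel pairs.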
[00046] The channel rematrixing technique allows an encoder to represent two-channel signals with less data by using a matrix to convert two independent audio channels into sum and difference channels. The BFP exponents and mantissas normally packed into a bit stream for the left and right audio channels represent, instead, the sum and difference channels. This technique can be used to advantage when the two channels have a high degree of similarity.
[00047] When rematrixing is used, as indicated by a flag in the bit stream, a decoder obtains the values representing the two audio channels by applying an appropriate matrix to the sum and difference values. Additional details can be obtained from Document A/52B.
D. Enhanced Coding Processes and Techniques
[00048] Annex E of Document A/52B describes characteristics of the enhanced AC-3 bit stream syntax that allow the use of additional coding tools. Some of these tools and related processes are briefly described below.
1. Adaptive Hybrid Transform Processing
[00049] The adaptive hybrid transform (AHT) coding technique provides another tool, besides block switching, for adapting the temporal and spectral resolution of the analysis and synthesis filter banks in response to varying signal characteristics, by applying two transforms in cascade. Additional information on AHT processing can be obtained from Document A/52B and from US patent 7,516,064 entitled "Adaptive Hybrid Transform for Signal Analysis and Synthesis" by Vinton et al., issued on April 7, 2009, which is incorporated herein by reference in its entirety.
[00050] Encoders employ a main transform, implemented by the MDCT analysis filter bank mentioned above, cascaded with a secondary transform implemented by a Type-II Discrete Cosine Transform (DCT-II). The MDCT is applied to overlapping blocks of audio signal samples to generate spectral coefficients representing the spectral content of the audio signal. The DCT-II can be switched into and out of the signal processing path as desired and, when switched in, is applied to blocks of MDCT spectral coefficients representing the same frequency to generate the hybrid transform coefficients. In typical use, the DCT-II is switched in when the input audio signal is judged to be sufficiently stationary, because its use significantly increases the effective spectral resolution of the analysis filter bank by decreasing its effective temporal resolution from 256 samples to 1536 samples.
[00051] Decoders employ an inverse main transform, implemented by the IMDCT synthesis filter bank mentioned above, cascaded with an inverse secondary transform implemented by a Type-II Inverse Discrete Cosine Transform (IDCT-II). The IDCT-II is switched into and out of the signal processing path in response to metadata provided by the encoder. When switched in, the IDCT-II is applied to blocks of hybrid transform coefficients to obtain inverse secondary transform coefficients. The inverse secondary transform coefficients can be the spectral coefficients input directly to the IMDCT if no other coding tool such as channel coupling or SPX was used. Alternatively, the MDCT spectral coefficients can be derived from the inverse secondary transform coefficients if coding tools such as channel coupling or SPX were used. After the MDCT spectral coefficients are obtained, the IMDCT is applied to the blocks of MDCT spectral coefficients in the conventional manner.
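The inverse secondary transform can be illustrated with a small DCT-II/IDCT-II pair operating on the coefficients that share one frequency bin across the blocks of a frame. The orthonormal scaling used here is an assumption for the sketch, since A/52B defines its own normalization; the forward DCT-II is included only so the round trip can be checked.

```python
import math

def dct_ii(samples):
    """Forward DCT-II (orthonormal scaling), as the secondary transform."""
    n = len(samples)
    out = []
    for m in range(n):
        s = sum(x * math.cos(math.pi * m * (2 * k + 1) / (2 * n))
                for k, x in enumerate(samples))
        scale = math.sqrt(1.0 / n) if m == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct_ii(coeffs):
    """Inverse DCT-II (a DCT-III): hybrid coefficients back to one value
    per block for this frequency bin."""
    n = len(coeffs)
    out = []
    for k in range(n):
        s = coeffs[0] / math.sqrt(n)
        for m in range(1, n):
            s += math.sqrt(2.0 / n) * coeffs[m] * math.cos(
                math.pi * m * (2 * k + 1) / (2 * n))
        out.append(s)
    return out
```

In the AHT cascade, this inverse would be applied once per frequency bin, recovering one MDCT coefficient per audio block before the IMDCT runs in the conventional manner.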
[00052] The AHT can be used with any audio channel, including the coupling channel and the LFE channel. A channel that is encoded using the AHT uses an alternative bit allocation process and two different types of quantization. One type is vector quantization (VQ) and the second type is gain-adaptive quantization (GAQ). The GAQ technique is discussed in US patent 6,246,345 entitled "Using Gain-Adaptive Quantization and Non-Uniform Symbol Lengths for Improved Audio Coding" by Davidson et al., issued on June 12, 2001, which is incorporated herein by reference in its entirety.
[00053] The use of the AHT requires a decoder to derive various parameters from the information contained in the encoded bit stream. Document A/52B describes how these parameters can be calculated. One set of parameters specifies the number of times BFP exponents are carried in a frame; these are derived by examining the metadata contained in all audio blocks in a frame. Two other sets of parameters identify which BFP mantissas are quantized using GAQ and provide gain control words for the quantizers; these are derived by examining the metadata for a channel in an audio block.
[00054] All of the hybrid transform coefficients for the AHT are carried in the first audio block, AB0, of a frame. If the AHT is applied to a coupling channel, the coupling coordinates for the AHT coefficients are distributed across all audio blocks in the same way as for coupled channels without the AHT. A process for resolving this situation is described below.
2. Spectral Extension Processing
[00055] The spectral extension (SPX) coding technique allows an encoder to reduce the amount of information needed to encode a full-bandwidth channel by excluding high-frequency spectral components from the encoded bit stream and having the decoder synthesize the missing spectral components from low-frequency spectral components that are contained in the encoded bit stream.
[00056] When SPX is used, the decoder synthesizes the missing spectral components by copying low-frequency MDCT coefficients into the locations of the missing high-frequency MDCT coefficients, adding pseudo-random values or noise to the copied transform coefficients, and scaling the amplitudes according to an SPX spectral envelope included in the encoded bit stream. The encoder calculates the SPX spectral envelope and inserts it into the encoded bit stream whenever the SPX coding tool is used.
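As a rough illustration of this synthesis, the following sketch copies a band of low-frequency coefficients into the missing high band, blends in pseudo-random noise, and scales each bin by a per-bin envelope. The function name, parameter layout, placement of the copy region, and blending rule are hypothetical simplifications for illustration, not the procedure specified in document A/52B.

```python
import random

def spx_synthesize(coeffs, spx_start, spx_end, envelope, noise_blend, seed=0):
    """Simplified SPX-style high-frequency synthesis (illustrative only).

    coeffs      - list of MDCT coefficients, valid below spx_start
    spx_start   - first bin to synthesize
    spx_end     - one past the last bin to synthesize
    envelope    - per-bin scale factors, len == spx_end - spx_start
    noise_blend - 0.0 = pure copy of low band, 1.0 = pure noise
    """
    rng = random.Random(seed)
    out = list(coeffs)
    # Copy from the band of equal width directly below the SPX region
    # (an assumption; the real copy region is signaled in the bit stream).
    base = spx_start - (spx_end - spx_start)
    for i in range(spx_start, spx_end):
        copied = coeffs[base + (i - spx_start)]
        noise = rng.uniform(-1.0, 1.0)
        blended = (1.0 - noise_blend) * copied + noise_blend * noise
        out[i] = blended * envelope[i - spx_start]
    return out
```

With `noise_blend` set to zero and a unit envelope, the high band becomes an exact copy of the low band, which makes the role of the envelope and the noise blend easy to see in isolation.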
[00057] The SPX technique is typically used to synthesize the highest bands of spectral components for a channel. It can be used in conjunction with channel coupling for a middle range of frequencies. Additional processing details can be obtained from document A/52B. 3. Channel and Program Extensions
[00058] The enhanced AC-3 bit stream syntax allows an encoder to generate an encoded bit stream that represents a single program with more than 5.1 channels (channel extension), two or more programs with up to 5.1 channels each (program extension), or a combination of programs with up to 5.1 channels and more than 5.1 channels. Program extension is implemented by multiplexing frames for multiple independent data streams into an encoded bit stream. Channel extension is implemented by multiplexing frames for one or more dependent substreams that are associated with an independent data stream. In preferred implementations of program extension, a decoder is informed which program or programs to decode, and the decoding process essentially skips or ignores the streams and substreams representing the programs that are not to be decoded.
[00059] Figures 5A to 5C illustrate three examples of bit streams carrying data with program and channel extensions. Figure 5A illustrates an exemplary bit stream with channel extension. A single program P1 is represented by an independent stream S0 and three associated dependent substreams SS0, SS1 and SS2. A frame Fn for the independent stream S0 is immediately followed by frames Fn for each of the associated dependent substreams SS0 to SS2. These frames are followed by the next frame Fn+1 for the independent stream S0, which is in turn immediately followed by the frames Fn+1 for each of the associated dependent substreams SS0 to SS2. The enhanced AC-3 bit stream syntax allows up to eight dependent substreams for each independent stream.
[00060] Figure 5B illustrates an exemplary bit stream with program extension. Each of four programs P1, P2, P3 and P4 is represented by the independent streams S0, S1, S2 and S3, respectively. A frame Fn for the independent stream S0 is immediately followed by frames Fn for each of the independent streams S1, S2 and S3. These frames are followed by the next frame Fn+1 for each of the independent streams. The enhanced AC-3 bit stream syntax requires at least one independent stream and allows up to eight independent streams.
[00061] Figure 5C illustrates an exemplary bit stream with both program extension and channel extension. Program P1 is represented by data in the independent stream S0, and program P2 is represented by data in the independent stream S1 and the associated dependent substreams SS0 and SS1. A frame Fn for the independent stream S0 is immediately followed by the frame Fn for the independent stream S1 which, in turn, is immediately followed by the frames Fn for the associated dependent substreams SS0 and SS1. These frames are followed by the next frame Fn+1 for each of the independent streams and dependent substreams.
[00062] An independent stream with no channel extension contains data that can represent up to 5.1 independent audio channels. An independent stream with channel extension, that is, an independent stream that has one or more associated dependent substreams, contains data that represents a downmix to 5.1 channels of all channels for the program. The term "downmix" refers to a combination of channels into a smaller number of channels. This is done for compatibility with decoders that do not decode dependent substreams. The dependent substreams contain data representing channels that replace or supplement the channels carried in the associated independent stream. Channel extension allows up to fourteen channels for a program.
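The downmix concept can be illustrated with a short sketch that mixes an arbitrary set of source channels into two output channels using per-channel gains. The gains used in the usage example are placeholders chosen for illustration; the actual downmix coefficients are carried in the bit stream metadata, not chosen by the decoder.

```python
def downmix_to_stereo(channels, gains):
    """Mix named source channels into a stereo pair (illustrative only).

    channels - dict mapping channel name to a list of samples
    gains    - dict mapping channel name to (left_gain, right_gain)
    """
    length = len(next(iter(channels.values())))
    left = [0.0] * length
    right = [0.0] * length
    for name, samples in channels.items():
        gl, gr = gains[name]
        for i, s in enumerate(samples):
            left[i] += gl * s
            right[i] += gr * s
    return left, right
```

For example, a center channel mixed equally into both outputs while a left channel passes only to the left output:

```python
ch = {'L': [1.0, 0.0], 'C': [0.0, 1.0]}
g = {'L': (1.0, 0.0), 'C': (0.707, 0.707)}  # hypothetical gains
```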
[00063] Additional details on the bit stream syntax and associated processing can be obtained from document A/52B. E. Block Priority Processing
[00064] Complex logic is required to correctly process and decode the many variations in bit stream structure that occur when various combinations of coding tools are used to generate the encoded bit stream. As mentioned above, algorithmic design details are not specified in the ATSC Standards, but a universal feature of conventional E-AC-3 decoder implementations is an algorithm that decodes all of the data in a frame for a respective channel before decoding data for another channel. This traditional approach reduces the amount of integrated-circuit memory required to decode the bit stream, but it also requires multiple passes over the data in each frame to read and examine the data in all of the audio blocks in the frame.
[00065] The traditional approach is illustrated schematically in figure 6. Component 19 parses frames from an encoded bit stream received from path 1 and extracts data from the frames in response to control signals received from path 20. The parsing is performed by multiple passes over the data in the frame. The data extracted from a frame are represented by the boxes below component 19. For example, the box labeled AB0-CH0 represents data extracted for channel 0 in audio block AB0, and the box labeled AB5-CH2 represents data extracted for channel 2 in audio block AB5. Only three channels 0 to 2 and three audio blocks 0, 1 and 5 are illustrated to simplify the drawing. Component 19 also passes parameters obtained from the frame metadata along path 20 to the channel processing components 31, 32 and 33. The signal paths and rotary switches to the left of the data boxes represent the logic performed by traditional decoders to process encoded audio data in order by channel. Channel processing component 31 receives encoded audio data and metadata via rotary switch 21 for channel CH0, starting with audio block AB0 and ending with audio block AB5, decodes the data, and generates an output signal by applying a synthesis filter bank to the decoded data. The results of its processing are passed along path 41. Channel processing component 32 receives data for channel CH1 for blocks AB0 to AB5 through rotary switch 22, processes the data, and passes its output along path 42. Channel processing component 33 receives data for channel CH2 for blocks AB0 to AB5 through rotary switch 23, processes the data, and passes its output along path 43.
[00066] Applications of the present invention can improve processing efficiency by eliminating multiple passes over the frame data in many situations. Multiple passes are still used in some situations when certain combinations of coding tools are used to generate the encoded bit stream; however, enhanced AC-3 bit streams generated by the combinations of coding tools discussed below can be decoded in a single pass. This new approach is illustrated schematically in figure 7. Component 19 parses frames from an encoded bit stream received from path 1 and extracts data from the frames in response to control signals received from path 20. In many situations, the parsing is performed in a single pass over the data in the frame. The data extracted from a frame are represented by the boxes below component 19, in the same way discussed above for figure 6. Component 19 passes the parameters obtained from the frame metadata along path 20 to the block processing components 61, 62 and 63. Block processing component 61 receives encoded audio data and metadata via rotary switch 51 for all channels in audio block AB0, decodes the data, and generates an output signal by applying a synthesis filter bank to the decoded data. The results of its processing for channels CH0, CH1 and CH2 are passed through rotary switch 71 to the appropriate output paths 41, 42 and 43, respectively. Block processing component 62 receives data for all channels in audio block AB1 via rotary switch 52, processes the data, and passes its output through rotary switch 72 to the appropriate output path for each channel. Block processing component 63 receives data for all channels in audio block AB5 via rotary switch 53, processes the data, and passes its output through rotary switch 73 to the appropriate output path for each channel.
[00067] Various aspects of the present invention are discussed below and illustrated with program fragments. These program fragments are not intended to be practical or optimal implementations, but only illustrative examples. For example, the order of the program's instructions can be changed by exchanging some of the instructions. 1. General process
[00068] A high-level illustration of the present invention is shown in the following program fragment:
(1.1) determine the start of a frame in the bit stream S
(1.2) for each frame N in the bit stream S
(1.3) unpack the metadata in frame N
(1.4) obtain parameters from the unpacked frame metadata
(1.5) determine the start of the first audio block K in frame N
(1.6) for audio block K in frame N
(1.7) unpack the metadata in block K
(1.8) obtain parameters from the unpacked block metadata
(1.9) determine the start of the first channel C in block K
(1.10) for channel C in block K
(1.11) unpack and decode exponents
(1.12) unpack and dequantize mantissas
(1.13) apply the synthesis filter bank to the decoded audio data for channel C
(1.14) determine the start of channel C+1 in block K
(1.15) end for
(1.16) determine the start of block K+1 in frame N
(1.17) end for
(1.18) determine the start of the next frame N+1 in the bit stream S
(1.19) end for
[00069] Instruction (1.1) searches the bit stream for a string of bits that matches the synchronization pattern carried in the SI information. When the synchronization pattern is found, the start of a frame in the bit stream has been determined.
[00070] Instructions (1.2) and (1.19) cause the decoding process to be executed for each frame in the bit stream, or until the decoding process is stopped by some other means. Instructions (1.3) to (1.18) execute processes that decode a frame of the encoded bit stream.
[00071] Instructions (1.3) to (1.5) unpack the metadata in the frame, obtain decoding parameters from the unpacked metadata, and determine the location in the bit stream where the data for the first audio block K in the frame begins. Instruction (1.16) determines the start of the next audio block in the bit stream if any subsequent audio blocks are in the frame.
[00072] Instructions (1.6) and (1.17) cause the decoding process to be performed for each audio block in the frame. Instructions (1.7) to (1.15) execute processes that decode an audio block in the frame. Instructions (1.7) to (1.9) unpack metadata in the audio block, obtain decoding parameters from the unpacked metadata and determine where the data starts for the first channel.
[00073] Instructions (1.10) and (1.15) cause the decoding process to be performed for each channel in the audio block. Instructions (1.11) to (1.13) unpack and decode exponents, use the decoded exponents to determine the bit allocation for unpacking and dequantizing each quantized mantissa, and apply the synthesis filter bank to the dequantized mantissas. Instruction (1.14) determines the place in the bit stream where the data for the next channel begins, if any subsequent channel is in the frame.
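The nesting of instructions (1.1) to (1.19) can be rendered as a runnable sketch. The data layout and the identity-style "unpacking" below are hypothetical stand-ins for real bit stream parsing; only the single-pass, block-major loop order corresponds to the fragment, with the synthesis filter bank stubbed as a simple exponent scaling.

```python
def decode_stream(frames):
    """Single-pass, block-order decode loop mirroring fragment (1).

    `frames` is a hypothetical pre-parsed stream: each frame is a dict with
    'metadata' and a list of 'blocks'; each block has 'metadata' and a list
    of per-channel coded data with 'exponents' and 'mantissas'.
    """
    outputs = []
    for frame in frames:                       # (1.2)
        frame_params = frame['metadata']       # (1.3)-(1.4)
        for block in frame['blocks']:          # (1.6)
            block_params = block['metadata']   # (1.7)-(1.8)
            for chan, coded in enumerate(block['channels']):  # (1.10)
                exponents = coded['exponents']                # (1.11)
                mantissas = coded['mantissas']                # (1.12)
                # (1.13) stand-in for the synthesis filter bank:
                # reconstruct each value as mantissa * 2^-exponent
                pcm = [m * (2.0 ** -e) for m, e in zip(mantissas, exponents)]
                outputs.append((frame_params['n'], block_params['k'], chan, pcm))
    return outputs
```

The key property is visible in the output order: all channels of block 0 are produced before any data of block 1, in contrast to the channel-major order of figure 6.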
[00074] The structure of the process varies to accommodate the different coding techniques used to generate the encoded bit stream. Several variations are discussed and illustrated in the program fragments below. The descriptions of the following program fragments omit some of the detail described for the previous program fragment. 2. Spectral Extension
[00075] When spectral extension (SPX) is used, the audio block at which the extension process begins contains shared parameters needed for SPX in that initial audio block as well as in any other audio blocks in the frame that use SPX. The shared parameters include an identification of the channels participating in the process, the frequency range of the spectral extension, and how the SPX spectral envelope for each channel is shared across time and frequency. These parameters are unpacked from the audio block that starts SPX use and stored in memory or in computer registers for use in SPX processing in subsequent audio blocks of the frame.
[00076] It is possible for a frame to have more than one initial audio block for SPX. An audio block starts SPX if the metadata for that audio block indicates that SPX is used and either the metadata for the previous audio block in the frame indicates that SPX is not used, or the audio block is the first block in a frame.
[00077] Each audio block that uses SPX either includes the SPX spectral envelope, referred to as the SPX coordinates, which is used for spectral extension processing in that audio block, or includes a "reuse" flag indicating that the SPX coordinates from a previous block should be used. The SPX coordinates in a block are unpacked and retained for possible reuse by SPX operations in subsequent audio blocks.
[00078] The following program fragment illustrates one way in which audio blocks using SPX can be processed:
(2.1) determine the start of a frame in the bit stream S
(2.2) for each frame N in the bit stream S
(2.3) unpack the metadata in frame N
(2.4) obtain parameters from the unpacked frame metadata
(2.5) if SPX frame parameters are present then unpack the SPX frame parameters
(2.6) determine the start of the first audio block K in frame N
(2.7) for audio block K in frame N
(2.8) unpack the metadata in block K
(2.9) obtain parameters from the unpacked block metadata
(2.10) if SPX block parameters are present then unpack the SPX block parameters
(2.11) for channel C in block K
(2.12) unpack and decode exponents
(2.13) unpack and dequantize mantissas
(2.14) if channel C uses SPX then
(2.15) extend the bandwidth of channel C
(2.16) end if
(2.17) apply the synthesis filter bank to the decoded audio data for channel C
(2.18) determine the start of channel C+1 in block K
(2.19) end for
(2.20) determine the start of block K+1 in frame N
(2.21) end for
(2.22) determine the start of the next frame N+1 in the bit stream S
(2.23) end for
[00079] Instruction (2.5) unpacks SPX frame parameters from the frame metadata if any are present in that metadata. Instruction (2.10) unpacks SPX block parameters from the block metadata if any are present in the block metadata. The SPX block parameters can include SPX coordinates for one or more channels in the block.
[00080] Instructions (2.12) and (2.13) unpack and decode exponents and use the decoded exponents to determine the bit allocation for unpacking and dequantizing each quantized mantissa. Instruction (2.14) determines whether channel C in the current audio block uses SPX. If it does, instruction (2.15) applies SPX processing to extend the bandwidth of channel C. This process provides the spectral components for channel C that are input to the synthesis filter bank applied in instruction (2.17). 3. Adaptive Hybrid Transform
[00081] When the adaptive hybrid transform (AHT) is used, the first audio block AB0 in a frame contains all of the hybrid transform coefficients for each channel processed by the DCT-II transform. For all other channels, each of the six audio blocks in the frame contains up to 256 spectral coefficients generated by the MDCT analysis filter bank.
[00082] For example, suppose an encoded bit stream contains data for the left, center and right channels. When the left and right channels are processed by AHT and the center channel is not, audio block AB0 contains all of the hybrid transform coefficients for each of the left and right channels and contains up to 256 spectral MDCT coefficients for the center channel. Audio blocks AB1 to AB5 contain spectral MDCT coefficients for the center channel and no coefficients for the left and right channels.
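One way to picture how the coefficients carried in AB0 turn back into one set of MDCT coefficients per block is to apply the inverse secondary transform along the time axis of each frequency bin. The sketch below assumes a length-6 IDCT-II with an illustrative normalization and a simple `aht_coeffs[block][bin]` layout; both are assumptions, not the arrangement specified in document A/52B.

```python
import math

def aht_inverse_secondary(aht_coeffs):
    """Apply a length-6 IDCT-II along the time axis of each frequency bin.

    aht_coeffs[b][f] holds the hybrid transform coefficient for time slot b
    and frequency bin f (all carried in audio block AB0).  Returns one list
    of MDCT coefficients per audio block, ready for buffering.
    """
    num_blocks = len(aht_coeffs)   # six audio blocks per frame
    num_bins = len(aht_coeffs[0])
    mdct_blocks = [[0.0] * num_bins for _ in range(num_blocks)]
    for f in range(num_bins):
        # Gather the six time slots for this frequency bin.
        col = [aht_coeffs[b][f] for b in range(num_blocks)]
        N = num_blocks
        for n in range(N):
            s = col[0] / 2 + sum(col[k] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                                 for k in range(1, N))
            mdct_blocks[n][f] = s * 2 / N
    return mdct_blocks
```

A decoder would run this once while processing AB0 and then serve `mdct_blocks[K]` from a buffer as each audio block K of the frame is processed.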
[00083] The following program fragment illustrates one way in which audio blocks with AHT coefficients can be processed:
(3.1) determine the start of a frame in the bit stream S
(3.2) for each frame N in the bit stream S
(3.3) unpack the metadata in frame N
(3.4) obtain parameters from the unpacked frame metadata
(3.5) determine the start of the first audio block K in frame N
(3.6) for audio block K in frame N
(3.7) unpack the metadata in block K
(3.8) obtain parameters from the unpacked block metadata
(3.9) determine the start of the first channel C in block K
(3.10) for channel C in block K
(3.11) if AHT is in use for channel C then
(3.12) if K = 0 then
(3.13) unpack and decode exponents
(3.14) unpack and dequantize mantissas
(3.15) apply the inverse secondary transform to the exponents and mantissas
(3.16) store the MDCT exponents and mantissas in buffer memory
(3.17) end if
(3.18) obtain the MDCT exponents and mantissas for block K from the buffer memory
(3.19) else
(3.20) unpack and decode exponents
(3.21) unpack and dequantize mantissas
(3.22) end if
(3.23) apply the synthesis filter bank to the decoded audio data for channel C
(3.24) determine the start of channel C+1 in block K
(3.25) end for
(3.26) determine the start of block K+1 in frame N
(3.27) end for
(3.28) determine the start of the next frame N+1 in the bit stream S
(3.29) end for
[00084] Instruction (3.11) determines whether AHT is in use for channel C. If it is, instruction (3.12) determines whether the first audio block AB0 is being processed. If the first audio block is being processed, then instructions (3.13) to (3.16) obtain all of the AHT coefficients for channel C, apply the inverse secondary transform, or IDCT-II, to the AHT coefficients to obtain spectral MDCT coefficients, and store them in a buffer memory. These spectral coefficients correspond to the exponents and dequantized mantissas that are obtained by instructions (3.20) and (3.21) for the channels for which AHT is not in use. Instruction (3.18) obtains the exponents and mantissas of the spectral MDCT coefficients that correspond to the audio block K being processed. If the first audio block (K = 0) is being processed, for example, then the exponents and mantissas for the group of spectral MDCT coefficients for the first block are obtained from the buffer memory. If the second audio block (K = 1) is being processed, for example, then the exponents and mantissas for the group of spectral MDCT coefficients for the second block are obtained from the buffer memory. 4. Spectral Extension and Adaptive Hybrid Transform
[00085] SPX and AHT can be used to generate encoded data for the same channels. The logic discussed above separately for spectral extension and hybrid transform processing can be combined to process channels for which SPX is in use, AHT is in use, or both SPX and AHT are in use.
[00086] The following program fragment illustrates one way in which audio blocks with SPX and AHT coefficients can be processed:
(4.1) determine the start of a frame in the bit stream S
(4.2) for each frame N in the bit stream S
(4.3) unpack the metadata in frame N
(4.4) obtain parameters from the unpacked frame metadata
(4.5) if SPX frame parameters are present then unpack the SPX frame parameters
(4.6) determine the start of the first audio block K in frame N
(4.7) for audio block K in frame N
(4.8) unpack the metadata in block K
(4.9) obtain parameters from the unpacked block metadata
(4.10) if SPX block parameters are present then unpack the SPX block parameters
(4.11) for channel C in block K
(4.12) if AHT is in use for channel C then
(4.13) if K = 0 then
(4.14) unpack and decode exponents
(4.15) unpack and dequantize mantissas
(4.16) apply the inverse secondary transform to the exponents and mantissas
(4.17) store the exponents and mantissas of the inverse secondary transform in buffer memory
(4.18) end if
(4.19) obtain the exponents and mantissas of the inverse secondary transform for block K from the buffer memory
(4.20) else
(4.21) unpack and decode exponents
(4.22) unpack and dequantize mantissas
(4.23) end if
(4.24) if channel C uses SPX then
(4.25) extend the bandwidth of channel C
(4.26) end if
(4.27) apply the synthesis filter bank to the decoded audio data for channel C
(4.28) determine the start of channel C+1 in block K
(4.29) end for
(4.30) determine the start of block K+1 in frame N
(4.31) end for
(4.32) determine the start of the next frame N+1 in the bit stream S
(4.33) end for
[00087] Instruction (4.5) unpacks SPX frame parameters from the frame metadata if any are present in that metadata. Instruction (4.10) unpacks SPX block parameters from the block metadata if any are present in the block metadata. The SPX block parameters can include SPX coordinates for one or more channels in the block.
[00088] Instruction (4.12) determines whether AHT is in use for channel C. If it is, instruction (4.13) determines whether this is the first audio block. If it is the first audio block, instructions (4.14) to (4.17) obtain all of the AHT coefficients for channel C, apply the inverse secondary transform, or IDCT-II, to the AHT coefficients to obtain inverse secondary transform coefficients, and store them in a buffer memory. Instruction (4.19) obtains the exponents and mantissas of the inverse secondary transform coefficients that correspond to the audio block K being processed.
[00089] If AHT is not in use for channel C, instructions (4.21) and (4.22) unpack and obtain the exponents and mantissas for channel C in block K as discussed above for program instructions (1.11) and (1.12).
[00090] Instruction (4.24) determines whether channel C in the current audio block uses SPX. If it does, instruction (4.25) applies SPX processing to the inverse secondary transform coefficients to extend the bandwidth, thereby obtaining the spectral MDCT coefficients of channel C. This process provides the spectral components for channel C that are input to the synthesis filter bank applied in instruction (4.27). If SPX processing is not used for channel C, the spectral MDCT coefficients are obtained directly from the inverse secondary transform coefficients. 5. Coupling and Adaptive Hybrid Transform
[00091] Channel coupling and AHT can be used to generate encoded data for the same channels. Essentially the same logic discussed above for spectral extension and hybrid transform processing can be used to process bit streams using channel coupling and AHT because the SPX processing details discussed above apply to the processing performed for channel coupling.
[00092] The following program fragment illustrates one way in which audio blocks with coupling and AHT coefficients can be processed:
(5.1) determine the start of a frame in the bit stream S
(5.2) for each frame N in the bit stream S
(5.3) unpack the metadata in frame N
(5.4) obtain parameters from the unpacked frame metadata
(5.5) if coupling frame parameters are present then unpack the coupling frame parameters
(5.6) determine the start of the first audio block K in frame N
(5.7) for audio block K in frame N
(5.8) unpack the metadata in block K
(5.9) obtain parameters from the unpacked block metadata
(5.10) if coupling block parameters are present then unpack the coupling block parameters
(5.11) for channel C in block K
(5.12) if AHT is in use for channel C then
(5.13) if K = 0 then
(5.14) unpack and decode exponents
(5.15) unpack and dequantize mantissas
(5.16) apply the inverse secondary transform to the exponents and mantissas
(5.17) store the exponents and mantissas of the inverse secondary transform in buffer memory
(5.18) end if
(5.19) obtain the exponents and mantissas of the inverse secondary transform for block K from the buffer memory
(5.20) else
(5.21) unpack and decode exponents for channel C
(5.22) unpack and dequantize mantissas for channel C
(5.23) end if
(5.24) if channel C uses coupling then
(5.25) if channel C is the first channel to use coupling then
(5.26) if AHT is in use for the coupling channel then
(5.27) if K = 0 then
(5.28) unpack and decode exponents of the coupling channel
(5.29) unpack and dequantize mantissas of the coupling channel
(5.30) apply the inverse secondary transform to the coupling channel
(5.31) store the exponents and mantissas of the inverse secondary transform of the coupling channel in buffer memory
(5.32) end if
(5.33) obtain the exponents and mantissas of the coupling channel for block K from the buffer memory
(5.34) else
(5.35) unpack and decode exponents of the coupling channel
(5.36) unpack and dequantize mantissas of the coupling channel
(5.37) end if
(5.38) end if
(5.39) derive the coupled channel C from the coupling channel
(5.40) end if
(5.41) apply the synthesis filter bank to the decoded audio data for channel C
(5.42) determine the start of channel C+1 in block K
(5.43) end for
(5.44) determine the start of block K+1 in frame N
(5.45) end for
(5.46) determine the start of the next frame N+1 in the bit stream S
(5.47) end for
[00093] Instruction (5.5) unpacks coupling frame parameters from the frame metadata if any are present in that metadata. Instruction (5.10) unpacks coupling block parameters from the block metadata if any are present in the block metadata. If present, coupling coordinates are obtained for the coupled channels in the block.
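Deriving a coupled channel from the shared coupling channel amounts to scaling each coupling-channel coefficient by the coupling coordinate for the band it falls in. The sketch below assumes fixed-width bands and a flat coefficient list; both are hypothetical simplifications of the banding actually signaled in the bit stream.

```python
def decouple(coupling_channel, coords, band_size=12):
    """Derive one coupled channel from the coupling channel (illustrative).

    coupling_channel - shared coupling-channel MDCT coefficients
    coords           - one coupling coordinate per band for this channel
    band_size        - hypothetical number of coefficients per coupling band
    """
    out = []
    for i, c in enumerate(coupling_channel):
        # Scale each coefficient by its band's coordinate for this channel.
        out.append(c * coords[i // band_size])
    return out
```

Each coupled channel in the block is reconstructed by repeating this scaling with that channel's own set of coordinates.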
[00094] Instruction (5.12) determines whether AHT is in use for channel C. If it is, instruction (5.13) determines whether this is the first audio block. If it is the first audio block, instructions (5.14) to (5.17) obtain all of the AHT coefficients for channel C, apply the inverse secondary transform, or IDCT-II, to the AHT coefficients to obtain inverse secondary transform coefficients, and store them in a buffer memory. Instruction (5.19) obtains the exponents and mantissas of the inverse secondary transform coefficients that correspond to the audio block K being processed.
[00095] If AHT is not in use for channel C, instructions (5.21) and (5.22) unpack and obtain the exponents and mantissas for channel C in block K as discussed above for program instructions (1.11) and (1.12).
[00096] Instruction (5.24) determines whether channel coupling is in use for channel C. If it is, instruction (5.25) determines whether channel C is the first channel in the block to use coupling. If it is, the exponents and mantissas for the coupling channel are obtained either by applying an inverse secondary transform to the exponents and mantissas of the coupling channel, as shown in instructions (5.26) to (5.33), or directly from data in the bit stream, as shown in instructions (5.35) and (5.36). The data representing the coupling channel are placed in the bit stream immediately after the data representing the mantissas of channel C. Instruction (5.39) derives the coupled channel C from the coupling channel using the appropriate coupling coordinates for channel C. If channel coupling is not used for channel C, the spectral MDCT coefficients are obtained directly from the inverse secondary transform coefficients. 6. Spectral Extension, Coupling and Adaptive Hybrid Transform
[00097] Spectral extension, channel coupling and AHT can all be used to generate encoded data for the same channels. The logic discussed above for combinations of AHT processing with spectral extension and with coupling can be combined to process channels using any combination of the three coding tools by incorporating the additional logic needed to manage the eight possible situations. Processing to decouple channels is performed before SPX processing is performed. F. Implementation
[00098] Devices that incorporate various aspects of the present invention can be implemented in a variety of ways, including software for execution by a computer or by some other device that includes more specialized components, such as digital signal processor (DSP) circuits coupled to components similar to those found in a general-purpose computer. Figure 8 is a schematic block diagram of a device 90 that can be used to implement aspects of the present invention. Processor 92 provides computing resources. RAM 93 is the system random access memory (RAM) used by processor 92 for processing. ROM 94 represents some form of persistent storage, such as read-only memory (ROM), for storing the programs needed to operate device 90 and possibly for carrying out various aspects of the present invention. I/O control 95 represents interface circuits for receiving and transmitting signals through communication channels 1, 16. In the embodiment shown, all of the main components of the system are connected to bus 91, which can represent more than one physical or logical bus; however, a bus architecture is not required to implement the present invention.
[00099] In embodiments implemented by a general-purpose computer system, additional components may be included to interface with devices such as a keyboard, mouse, and display, and to control a storage device that has a storage medium such as magnetic tape or disk, or an optical medium. The storage medium can be used to record programs of instructions for operating systems, utilities, and applications, and can include programs that implement various aspects of the present invention.
[000100] The functions required to practice various aspects of the present invention can be performed by components that are implemented in a wide variety of ways, including discrete logic components, integrated circuits, one or more ASICs, and/or program-controlled processors. The way in which these components are implemented is not important to the present invention.
[000101] Software implementations of the present invention can be conveyed by a variety of machine-readable media, such as baseband or modulated communication paths throughout the spectrum including from supersonic to ultraviolet frequencies, or storage media that convey information using essentially any recording technology, including magnetic tape, cards or disk, optical cards or disk, and detectable markings on media including paper.
Claims (6)
[0001]
1. Method for decoding a frame of an encoded digital audio signal, in which: the frame comprises frame metadata, a first audio block and one or more subsequent audio blocks; and each of the first and subsequent audio blocks comprises block metadata and encoded audio data for two or more audio channels, where: the encoded audio data comprise scaling factors and scaled values representing spectral content of the two or more audio channels, each scaled value being associated with a respective one of the scaling factors; and the block metadata comprise control information describing encoding tools used by an encoding process that produced the encoded audio data, the encoding tools including adaptive hybrid transform processing that comprises: applying an analysis filter bank, implemented by a primary transform, to the two or more audio channels to generate primary transform coefficients, and applying a secondary transform to the primary transform coefficients of at least some of the two or more audio channels to generate hybrid transform coefficients; and characterized by the fact that the method comprises the steps of: (1) receiving the frame of the encoded digital audio signal; and (2) examining the frame of the encoded digital audio signal in a single pass to decode the encoded audio data for each audio block in block order, wherein decoding each respective audio block comprises: (3) determining, for each respective channel of the two or more channels, whether the encoding process used adaptive hybrid transform processing to encode any of the encoded audio data; (4) if the encoding process used adaptive hybrid transform processing for the respective channel: (a) if the respective audio block is the first audio block in the frame: (i) obtaining all the hybrid transform coefficients of the respective channel for the frame from the audio data encoded in the first audio block, and (ii) applying an inverse secondary transform to the hybrid transform coefficients to
obtain inverse secondary transform coefficients, and (b) obtaining primary transform coefficients from the inverse secondary transform coefficients for the respective channel in the respective audio block; (5) if the encoding process did not use adaptive hybrid transform processing for the respective channel, obtaining primary transform coefficients for the respective channel by decoding the encoded audio data in the respective audio block; and (6) applying an inverse primary transform to the primary transform coefficients to generate an output signal representing the respective channel in the respective audio block.
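The inverse secondary transform of step (4)(ii) can be pictured with a small numerical sketch. In E-AC-3, the primary transform is an MDCT and the secondary transform of the adaptive hybrid transform is applied across the audio blocks of a frame; the sketch below models it as an orthonormal DCT-II over the six block positions, so the decoder's inverse is the matching DCT-III (the matrix transpose). This is an illustrative model under assumed scaling and helper names, not the normative transform of the ATSC Standard.

```python
import numpy as np

NUM_BLOCKS = 6  # an enhanced AC-3 frame carries up to six audio blocks


def dct_ii_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is its inverse (a DCT-III)."""
    k = np.arange(n)[:, None]   # output (secondary-transform) index
    t = np.arange(n)[None, :]   # input (block-position) index
    basis = np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    scale = np.where(k == 0, np.sqrt(1.0 / n), np.sqrt(2.0 / n))
    return scale * basis


def inverse_secondary_transform(hybrid_coeffs):
    """Map hybrid transform coefficients (rows = secondary-transform index,
    columns = frequency bins) back to one row of primary transform
    coefficients per audio block."""
    c = dct_ii_matrix(hybrid_coeffs.shape[0])
    return c.T @ hybrid_coeffs  # inverse of an orthonormal transform


# Per the claim, all hybrid coefficients for a channel arrive in the
# first audio block of the frame:
rng = np.random.default_rng(0)
hybrid = rng.standard_normal((NUM_BLOCKS, 256))
primary_per_block = inverse_secondary_transform(hybrid)
```

Because the sketch's transform is orthonormal, re-applying the forward DCT-II to `primary_per_block` recovers `hybrid` exactly, which is a convenient way to sanity-check an implementation.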
[0002]
2. Method according to claim 1, characterized by the fact that the frame of the encoded digital audio signal conforms to the enhanced AC-3 bit stream syntax.
[0003]
3. Method according to claim 2, characterized by the fact that the encoding tools include spectral extension processing and the decoding of each respective audio block further comprises: determining whether the decoding process should use spectral extension processing to decode any of the encoded audio data; and if spectral extension processing is to be used, synthesizing one or more spectral components from the inverse secondary transform coefficients to obtain primary transform coefficients with an extended bandwidth.
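The spectral extension step of claim 3 can be sketched as follows: the bit stream carries transform coefficients only up to some cutoff bin, and the decoder synthesizes the missing high band by translating low-band coefficients upward and blending in shaped noise. The parameter names (`target_bins`, `noise_blend`) and the flat per-bin blending below are assumptions for illustration; the actual E-AC-3 tool operates on banded frequency regions governed by transmitted envelope data.

```python
import numpy as np


def extend_bandwidth(low_band, target_bins, noise_blend=0.3, seed=0):
    """Toy spectral-extension synthesis: translate decoded low-band
    coefficients into the missing high band and blend in noise so the
    result spans `target_bins` coefficients."""
    rng = np.random.default_rng(seed)
    n_low = low_band.shape[0]
    n_high = target_bins - n_low
    reps = -(-n_high // n_low)  # ceiling division, in case the high band is wider
    translated = np.tile(low_band, reps)[:n_high]
    # Noise scaled to the low band's energy stands in for transmitted envelopes.
    noise = rng.standard_normal(n_high) * low_band.std()
    high_band = (1.0 - noise_blend) * translated + noise_blend * noise
    return np.concatenate([low_band, high_band])


low = np.random.default_rng(1).standard_normal(128)
extended = extend_bandwidth(low, target_bins=256)
```

Note that the decoded low band passes through untouched; only the bins above the cutoff are synthesized.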
[0004]
4. Method according to claim 2 or 3, characterized by the fact that the encoding tools include channel coupling and the decoding of each respective audio block further comprises: determining whether the encoding process used channel coupling to encode any of the encoded audio data; and if the encoding process used channel coupling, deriving spectral components from the inverse secondary transform coefficients to obtain primary transform coefficients for the coupled channels.
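The decoupling step of claim 4 amounts to a per-band scaling: the coupled channels share one set of coupling-channel coefficients, and each channel is reconstructed by multiplying those shared coefficients by its own per-band coupling coordinates. The band edges and coordinate values in this sketch are invented for illustration and do not reflect the Standard's band layout.

```python
import numpy as np


def decouple_channels(cpl_coeffs, cpl_coords, band_edges):
    """Reconstruct each coupled channel by scaling the shared coupling
    channel with per-channel, per-band coupling coordinates.

    cpl_coeffs : shared coupling-channel coefficients, shape (bins,)
    cpl_coords : coupling coordinates, shape (channels, bands)
    band_edges : bin index delimiting each band, length bands + 1
    """
    n_ch = cpl_coords.shape[0]
    out = np.zeros((n_ch, cpl_coeffs.shape[0]))
    for ch in range(n_ch):
        for b, (lo, hi) in enumerate(zip(band_edges[:-1], band_edges[1:])):
            out[ch, lo:hi] = cpl_coords[ch, b] * cpl_coeffs[lo:hi]
    return out


coeffs = np.ones(12)                    # shared coupling-channel spectrum
coords = np.array([[1.0, 0.5, 0.25],    # channel 0, one coordinate per band
                   [2.0, 1.0, 0.5]])    # channel 1
bands = [0, 4, 8, 12]                   # three bands of four bins each
reconstructed = decouple_channels(coeffs, coords, bands)
```

Each output row is the shared spectrum re-shaped by that channel's coordinates, which is what lets coupling transmit the high band of several channels only once.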
[0005]
5. Apparatus for decoding a frame of an encoded digital audio signal, characterized by the fact that the apparatus comprises means for carrying out all the steps of any one of claims 1 to 4.
[0006]
6. Storage medium recording a program that is executable by a device to decode a frame of an encoded digital audio signal, characterized by the fact that the program carries out the method as defined in any one of claims 1 to 4.
Similar technologies:
Publication number | Publication date | Patent title
BR112012013745B1|2020-10-27|METHOD FOR DECODING A FRAME FROM AN ENCODED DIGITAL AUDIO SIGNAL, APPLIANCE FOR DECODING A FRAME FROM A CODED DIGITAL AUDIO SIGNAL AND STORAGE MEDIA RECORDING THE METHOD
KR100992081B1|2010-11-04|Conversion of synthesized spectral components for encoding and low-complexity transcoding
BRPI0709235B1|2019-10-15|AUDIO DECODER, AUDIO DECODING METHOD, RECEIVER FOR RECEIVING A N CHANNEL SIGNAL, TRANSMISSION SYSTEM FOR TRANSMITTING AN AUDIO SIGN, METHOD FOR RECEIVING AN AUDIO SIGNAL, METHOD FOR TRANSMITTING AND RECEIVING A SIGNAL SIGNAL READY BY COMPUTER, AND AUDIO PLAYBACK
BR112015007532B1|2021-08-03|ENCODER, DECODER AND METHODS FOR REGRESSIVE COMPATIBLE MULTIRESOLUTION SPATIAL AUDIO OBJECT ENCODING
BRPI0410130B1|2018-06-05|"METHOD AND ENCODER FOR CODING INPUT AUDIO SIGNS, AND METHOD AND DECODER FOR DECODING AN ENCODED SIGNAL"
BRPI0507806B1|2019-06-25|METHOD FOR GENERATING AN OUTPUT SIGNAL, APPARATUS FOR GENERATING AN OUTPUT SIGN AND RECORDING SUPPORT
BR122018069728B1|2019-03-19|EQUIPMENT AND METHOD FOR PROCESSING A MULTI-CHANNEL AUDIO SIGNAL, EQUIPMENT FOR INVERT PROCESSING OF INPUT DATA AND INVERSE PROCESSING METHOD
BRPI0609897A2|2011-10-11|encoder, decoder, method for encoding a multichannel signal, encoded multichannel signal, computer program product, transmitter, receiver, transmission system, methods of transmitting and receiving a multichannel signal, recording and reproducing devices. audio and storage medium
BR112019014125B1|2021-11-16|METHOD AND DECODER FOR DECODING AN ENCODED AUDIO BITS STREAM AND NON-TRANSIENT COMPUTER-READABLE MEDIA
BR112015028914B1|2021-12-07|METHOD AND APPARATUS TO RECONSTRUCT A TIME/FREQUENCY BLOCK OF AUDIO OBJECTS N, METHOD AND ENCODER TO GENERATE AT LEAST ONE WEIGHTING PARAMETER, AND COMPUTER-READable MEDIUM
DK2691951T3|2016-11-14|TRANSFORMATION WITH REDUCED COMPLEXITY OF AN Low-Frequency
TWI470622B|2015-01-21|Reduced complexity transform for a low-frequency-effects channel
BR112015025080B1|2021-12-21|DECODING METHOD AND DECODER TO DECODE TWO AUDIO SIGNALS, ENCODING METHOD AND ENCODER TO ENCODE TWO AUDIO SIGNALS, AND NON-TRANSITORY READY MEDIUM
Patent family:
Publication number | Publication date
ES2463840T3|2014-05-29|
EP2801975A1|2014-11-12|
KR20130116959A|2013-10-24|
AU2010328635B2|2014-02-13|
AR079878A1|2012-02-29|
US8891776B2|2014-11-18|
HRP20140400T1|2014-06-06|
ECSP12012006A|2012-08-31|
EP2510515A1|2012-10-17|
BR112012013745A2|2016-03-15|
EA201270642A1|2012-12-28|
AU2010328635A1|2012-05-17|
CA2779453A1|2011-06-16|
UA100353C2|2012-12-10|
NZ599981A|2014-07-25|
JP5547297B2|2014-07-09|
AP2012006289A0|2012-06-30|
TWI498881B|2015-09-01|
KR101370522B1|2014-03-06|
MX2012005723A|2012-06-13|
MY161012A|2017-03-31|
CN102687198A|2012-09-19|
CN104217724B|2017-04-05|
HN2012000819A|2015-03-16|
SI2510515T1|2014-06-30|
EP2510515B1|2014-03-19|
IL219304D0|2012-06-28|
CO6460719A2|2012-06-15|
CL2012001493A1|2012-10-19|
DK2510515T3|2014-05-19|
PT2510515E|2014-05-23|
EP2706529A2|2014-03-12|
JP2014063187A|2014-04-10|
WO2011071610A1|2011-06-16|
NI201200063A|2013-06-13|
KR20120074305A|2012-07-05|
TW201126511A|2011-08-01|
IL219304A|2015-05-31|
RS53288B|2014-08-29|
AP3301A|2015-06-30|
JP5607809B2|2014-10-15|
CN104217724A|2014-12-17|
CA2779453C|2015-12-22|
CN102687198B|2014-09-24|
PL2510515T3|2014-07-31|
KR101629306B1|2016-06-10|
GT201200134A|2013-08-29|
US9620132B2|2017-04-11|
EP2801975B1|2017-01-04|
EA024310B1|2016-09-30|
US20120243692A1|2012-09-27|
MA33775B1|2012-11-01|
EP2706529A3|2014-04-02|
ZA201203290B|2013-07-31|
TN2012000211A1|2013-12-12|
US20150030161A1|2015-01-29|
HK1170058A1|2013-02-15|
JP2013511754A|2013-04-04|
PE20130167A1|2013-02-16|
Cited references:
Publication number | Application date | Publication date | Applicant | Patent title

DK0520068T3|1991-01-08|1996-07-15|Dolby Ray Milton|Codes / decoders for multidimensional sound fields|
JPH10340099A|1997-04-11|1998-12-22|Matsushita Electric Ind Co Ltd|Audio decoder device and signal processor|
US6356639B1|1997-04-11|2002-03-12|Matsushita Electric Industrial Co., Ltd.|Audio decoding apparatus, signal processing device, sound image localization device, sound image control method, audio signal processing device, and audio signal high-rate reproduction method used for audio visual equipment|
US6246345B1|1999-04-16|2001-06-12|Dolby Laboratories Licensing Corporation|Using gain-adaptive quantization and non-uniform symbol lengths for improved audio coding|
US7292901B2|2002-06-24|2007-11-06|Agere Systems Inc.|Hybrid multi-channel/cue coding/decoding of audio signals|
US7502743B2|2002-09-04|2009-03-10|Microsoft Corporation|Multi-channel audio encoding and decoding with multi-channel transform selection|
CN1261663C|2002-12-31|2006-06-28|深圳市高科智能系统有限公司|Method for central radio control of entrance guard and door locks and system device|
US7516064B2|2004-02-19|2009-04-07|Dolby Laboratories Licensing Corporation|Adaptive hybrid transform for signal analysis and synthesis|
US9454974B2|2006-07-31|2016-09-27|Qualcomm Incorporated|Systems, methods, and apparatus for gain factor limiting|
US7953595B2|2006-10-18|2011-05-31|Polycom, Inc.|Dual-transform coding of audio signals|
KR101325802B1|2007-02-06|2013-11-05|엘지전자 주식회사|Digital Broadcasting Transmitter, Digital Broadcasting Receiver and System and Method for Serving Digital Broadcasting|
CN101067931B|2007-05-10|2011-04-20|芯晟科技有限公司|Efficient configurable frequency domain parameter stereo-sound and multi-sound channel coding and decoding method and system|
WO2008151755A1|2007-06-11|2008-12-18|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Audio encoder for encoding an audio signal having an impulse- like portion and stationary portion, encoding methods, decoder, decoding method; and encoded audio signal|
CN101816191B|2007-09-26|2014-09-17|弗劳恩霍夫应用研究促进协会|Apparatus and method for extracting an ambient signal|
CN101896967A|2007-11-06|2010-11-24|诺基亚公司|An encoder|
EP2107556A1|2008-04-04|2009-10-07|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Audio transform coding using pitch correction|US7711123B2|2001-04-13|2010-05-04|Dolby Laboratories Licensing Corporation|Segmenting audio signals into auditory events|
US8948406B2|2010-08-06|2015-02-03|Samsung Electronics Co., Ltd.|Signal processing method, encoding apparatus using the signal processing method, decoding apparatus using the signal processing method, and information storage medium|
US20120033819A1|2010-08-06|2012-02-09|Samsung Electronics Co., Ltd.|Signal processing method, encoding apparatus therefor, decoding apparatus therefor, and information storage medium|
US9130596B2|2011-06-29|2015-09-08|Seagate Technology Llc|Multiuse data channel|
US9697840B2|2011-11-30|2017-07-04|Dolby International Ab|Enhanced chroma extraction from an audio codec|
ES2568640T3|2012-02-23|2016-05-03|Dolby International Ab|Procedures and systems to efficiently recover high frequency audio content|
EP2898506B1|2012-09-21|2018-01-17|Dolby Laboratories Licensing Corporation|Layered approach to spatial audio coding|
TWI618051B|2013-02-14|2018-03-11|杜比實驗室特許公司|Audio signal processing method and apparatus for audio signal enhancement using estimated spatial parameters|
JP6046274B2|2013-02-14|2016-12-14|ドルビー ラボラトリーズ ライセンシング コーポレイション|Method for controlling inter-channel coherence of an up-mixed audio signal|
WO2014126688A1|2013-02-14|2014-08-21|Dolby Laboratories Licensing Corporation|Methods for audio signal transient detection and decorrelation control|
TWI618050B|2013-02-14|2018-03-11|杜比實驗室特許公司|Method and apparatus for signal decorrelation in an audio processing system|
US8804971B1|2013-04-30|2014-08-12|Dolby International Ab|Hybrid encoding of higher frequency and downmixed low frequency content of multichannel audio|
US20140355769A1|2013-05-29|2014-12-04|Qualcomm Incorporated|Energy preservation for decomposed representations of a sound field|
US9466305B2|2013-05-29|2016-10-11|Qualcomm Incorporated|Performing positional analysis to code spherical harmonic coefficients|
TWM487509U|2013-06-19|2014-10-01|杜比實驗室特許公司|Audio processing apparatus and electrical device|
WO2014210284A1|2013-06-27|2014-12-31|Dolby Laboratories Licensing Corporation|Bitstream syntax for spatial voice coding|
EP3044876B1|2013-09-12|2019-04-10|Dolby Laboratories Licensing Corporation|Dynamic range control for a wide variety of playback environments|
CN105659320B|2013-10-21|2019-07-12|杜比国际公司|Audio coder and decoder|
US9502045B2|2014-01-30|2016-11-22|Qualcomm Incorporated|Coding independent frames of ambient higher-order ambisonic coefficients|
US9922656B2|2014-01-30|2018-03-20|Qualcomm Incorporated|Transitioning of ambient higher-order ambisonic coefficients|
US10770087B2|2014-05-16|2020-09-08|Qualcomm Incorporated|Selecting codebooks for coding vectors decomposed from higher-order ambisonic audio signals|
US9620137B2|2014-05-16|2017-04-11|Qualcomm Incorporated|Determining between scalar and vector quantization in higher order ambisonic coefficients|
US9852737B2|2014-05-16|2017-12-26|Qualcomm Incorporated|Coding vectors decomposed from higher-order ambisonics audio signals|
CN105280212A|2014-07-25|2016-01-27|中兴通讯股份有限公司|Audio mixing and playing method and device|
US9747910B2|2014-09-26|2017-08-29|Qualcomm Incorporated|Switching between predictive and non-predictive quantization techniques in a higher order ambisonicsframework|
TWI693594B|2015-03-13|2020-05-11|瑞典商杜比國際公司|Decoding audio bitstreams with enhanced spectral band replication metadata in at least one fill element|
US9837086B2|2015-07-31|2017-12-05|Apple Inc.|Encoded audio extended metadata-based dynamic range control|
US10504530B2|2015-11-03|2019-12-10|Dolby Laboratories Licensing Corporation|Switching between transforms|
US10015612B2|2016-05-25|2018-07-03|Dolby Laboratories Licensing Corporation|Measurement, verification and correction of time alignment of multiple audio channels and associated metadata|
US10885921B2|2017-07-07|2021-01-05|Qualcomm Incorporated|Multi-stream audio coding|
Legal status:
2019-01-08| B06F| Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]|
2019-09-03| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2020-04-22| B06A| Patent application procedure suspended [chapter 6.1 patent gazette]|
2020-07-28| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2020-10-27| B16A| Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 28/10/2010, SUBJECT TO THE LEGAL CONDITIONS. |
Priority:
申请号 | 申请日 | 专利标题
US26742209P| true| 2009-12-07|2009-12-07|
US61/267,422|2009-12-07|
PCT/US2010/054480|WO2011071610A1|2009-12-07|2010-10-28|Decoding of multichannel audio encoded bit streams using adaptive hybrid transformation|